Abstract

The Bateson-Dobzhansky-Muller (BDM) model of reproductive isolation by genetic incompatibility is a widely accepted model of 
speciation. Because of the exceptionally rich biological information about the budding yeast Saccharomyces cerevisiae, the identi- 
fication of BDM incompatibilities in yeast would greatly deepen our understanding of the molecular genetic basis of reproductive 
isolation and speciation. However, despite repeated efforts, BDM incompatibilities between nuclear genes have never been identified 
between S. cerevisiae and its sister species S. paradoxus. Such negative results have led to the belief that simple nuclear BDM
incompatibilities do not exist between the two yeast species. Here, we explore an alternative explanation that such incompatibilities 
exist but were undetectable due to limited statistical power. We discover that previously employed statistical methods were not ideal 
and that a redesigned method improves the statistical power. We determine, under various sample sizes, the probabilities of iden- 
tifying BDM incompatibilities that cause F1 spore inviability with incomplete penetrance, and confirm that the previously used samples 
were too small to detect such incompatibilities. Our findings call for an expanded experimental search for yeast BDM incompatibilities, 
which has become possible with the decreasing cost of genome sequencing. The improved methodology developed here is, in 
principle, applicable to other organisms and can help detect epistasis in general. 
Introduction

Speciation, the "mystery of mysteries" in Darwin's words 
(Darwin 1859), is one of the most important processes in evolution,
responsible for the generation of the tremendous
biodiversity on Earth. Important as it is, speciation is not well 
understood at the genetic level. For example, it is unknown 
how many genetic changes underlie the formation of a new 
species in nature, and the relative roles of natural selection and 
genetic drift in causing these changes are debated (Schluter 
2009; Nei and Nozawa 2011). A key step in speciation is the
establishment of reproductive isolation, which can occur pre- 
or postzygotically (Coyne and Orr 2004). Genetic incompati- 
bility is thought to be the major cause of postzygotic isolation. 
Specifically, the Bateson-Dobzhansky-Muller (BDM) model 
asserts that a genetic change at locus A in one population 
and a genetic change at locus B in another population may 
be incompatible when residing in the same genome upon the 
hybridization between individuals of the two populations,
which could result in postzygotic incompatibility and lead to
inviability, infertility, or inferiority (Orr 1996). Although this 
model is generally accepted, only a small number of genes in 
a few species pairs have been identified to be genetically in- 
compatible (Wu and Ting 2004; Maheshwari and Barbash 
2011; Nosil and Schluter 2011). One classical example involves
the melanoma formation in the hybrids of Xiphophorus spe- 
cies. Normally, the Tu locus controls the formation of spots 
composed of black pigment cells. In interspecific hybrids between
the platyfish X. maculatus and swordtail X. helleri, these
spots sometimes spontaneously develop into malignant mela- 
nomas (Wittbrodt et al. 1989). A two-locus BDM model can 
explain this phenomenon: overexpression of Tu, which has 
been identified to be Xmrk on the X chromosome, causes 
melanomas to form (Adam et al. 1993), whereas an autosomal
repressor gene mapped near cdkn2a/b negatively regulates Tu 
(Schartl et al. 2013). The hybrids that have Tu but not the 
repressor will develop melanomas (Meierjohann et al. 2004). 
There is, however, much disagreement on the existence of
such major BDM incompatibilities and their role in speciation 
in general (Liti et al. 2006; Maheshwari and Barbash 2011). 
Identifying such genes and studying their functions and evo- 
lution could help settle this debate and uncover the molecular 
genetic basis of reproductive isolation and speciation. Because 
BDM incompatibilities are expected to accumulate with the 
divergence of two species, identifying such incompatibilities 
from closely related species is most relevant to understanding 
speciation (Nosil and Schluter 2011).

For four reasons, the budding yeast Saccharomyces cerevisiae
(Sc) and its sister species S. paradoxus (Sp) appear to be
ideal for identifying BDM incompatibilities and studying their 
roles in speciation. First, S. cerevisiae is one of the best studied
eukaryotes, with abundant information on its genetics, geno- 
mics, physiology, cell biology, and molecular biology. There 
are also numerous genetic tools and methods for studying 
yeast. Its short generation time allows rapid genetic analysis 
and its small genome (~12 million bases) makes genotyping
and fine genetic mapping easier than in other species. Second, 
separated approximately 10 Ma (Kawahara and Imanishi 
2007) and with approximately 85% genome sequence identity
(Kellis et al. 2003), Sc and Sp are relatively closely related.
The two species can readily mate with each other (Murphy 
et al. 2006); yet, their postzygotic isolation is strong, with 
Sc-Sp hybrids producing only approximately 1% viable 
spores (Hunter et al. 1996). Third, the genomes of the two 
species are essentially collinear with no gross chromosomal 
rearrangements and no reciprocal translocation; only four in- 
versions and three segmental duplications exist (Kellis et al. 
2003). This fact eliminates chromosomal rearrangement as a 
major contributor to their postzygotic isolation. Fourth, the 
genotypes and phenotypes of yeast haploids can be directly 
analyzed, avoiding the need to generate homozygotes from 
the spores produced by F1 hybrids. Note that F1 hybrids are 
not suitable for identifying genetic incompatibilities unless 
they are dominant, but a previous study has excluded the 
existence of dominant genetic incompatibilities underlying 
the infertility of the hybrid between Sc and Sp (Greig et al.
2002). One complication of the yeast system is that a large 
fraction of spores produced by Sc-Sp hybrids are killed by 
aneuploidy (Hunter et al. 1996). At least one recombination 
is usually required for correct segregation of homologous 
chromosomes during meiosis. In the Sc-Sp hybrid, the se- 
quence differences between homologous chromosomes 
cause the mismatch repair system to suppress recombination, 
resulting in a high frequency of aneuploidy (Chambers et al. 
1996). Deleting the mismatch repair gene MSH2 increases the
recombination rate in the hybrid from 5.4 to 35.6 crossovers 
per meiosis (Kao et al. 2010). Consequently, F1 spore viability 
rises to approximately 10% (Kao et al. 2010). 

Research in the last decade has focused on understanding 
the genetic basis of Sc-Sp F1 hybrid infertility, which is equivalent
to F1 spore inviability. Despite the multiple advantages of
the study system and repeated efforts (Greig et al. 2002; Greig
2007; Kao et al. 2010; Xu and He 2011), no nuclear-nuclear
genetic incompatibilities have been identified for Sc-Sp F1 
infertility, although a mitochondrial-nuclear incompatibility 
has been reported for F2 hybrid infertility (Chou et al. 
2010). Two general strategies have been used to identify nuclear-nuclear
genetic incompatibilities between Sc and Sp.
The first approach is to replace chromosomes in Sc with
their Sp homologs one at a time. If interchromosomal incom- 
patibilities exist, one would observe a reduction in strain 
fertility, viability, or growth rate upon a chromosomal replace- 
ment. The fact that such replacements were made for at least 
9 of the 16 chromosomes demonstrates the lack of BDM in- 
compatibility for F1 spore viability in the 9 chromosomes 
(Greig 2007). This result, however, does not exclude the pos- 
sibility of incompatibilities for F1 spore growth rate or higher 
order incompatibilities for viability. Note that even when an 
interchromosomal incompatibility is detected using this ap- 
proach, further work is needed to localize the incompatible 
genes. 

The second approach is to identify genetic incompatibilities 
in F1 spores by linkage analysis. Briefly, if the Sc allele at locus
A (A_Sc) is incompatible with the Sp allele at locus B (B_Sp), spores
of the genotype A_Sc B_Sp may have reduced viability and thus
may be underrepresented among viable F1 spores. This decrease
in frequency also applies to pairs of markers closely
linked to A_Sc and B_Sp, respectively. Thus, it is possible to use
existing genetic markers such as single nucleotide differences 
(SNDs) between the two species to map BDM incompatibili- 
ties. This approach is virtually identical to mapping genetic 
interaction or epistasis. Because of the large number of 
marker pairs to be tested, the statistical power is expected 
to be low. 

Two groups have used the above second approach to look 
for incompatibilities between Sc and Sp that kill F1 spores with
100% penetrance, but with no success (Kao et al. 2010; Xu
and He 2011). The negative result has led to the suggestion
that two-locus BDM incompatibilities do not exist in yeast and 
are unimportant to yeast speciation (Kao et al. 2010). 
However, for two reasons, genetic incompatibility need not 
have 100% penetrance. First, an incompatibility may only in- 
crease the probability of spore inviability rather than killing the 
spore deterministically, because spore viability is likely to be a 
complex trait controlled by multiple genes. Second, a high- 
order incompatibility behaves like a two-locus incompatibility 
with incomplete penetrance. For instance, a three-locus in- 
compatibility with 100% penetrance behaves exactly as a 
two-locus incompatibility with 50% penetrance. Given the 
possibility of incomplete penetrance, one wonders what conclusion
about the genetic incompatibility between Sc and
Sp can be drawn from the existing data of the linkage analysis. 
To answer this question, it becomes necessary to understand 
the properties of this linkage analysis. Here, we use computer 
simulation to inspect the statistical properties of the linkage 
analysis, under the scenario that two-locus genetic incompatibilities
cause F1 spore inviability with incomplete penetrance,
which, as aforementioned, includes the possibility of multiple- 
locus incompatibility. We show that the previously designed 
statistical method is not ideal and propose a modified method 
that improves the statistical power. We find previously used 
sample sizes too small to detect genetic incompatibilities and 
offer guidelines for future experimental searches of the BDM 
incompatibilities between Sc and Sp. These results may apply
to the study of BDM incompatibilities in other species and 
more generally to epistasis mapping. 

Materials and Methods 

General Strategy of Simulating the Identification of BDM 
Incompatibilities 

Based on theoretical predictions and experimental results 
(Welch 2004; Wu and Ting 2004; Lee et al. 2008), we 
assume that genetic incompatibilities are asymmetric. That 
is, if A_Sc and B_Sp are incompatible, A_Sp and B_Sc can still be
compatible (fig. 1A). We define I as the probability that an
F1 spore dies due to an incompatible allelic pair. We consider
the use of msh2 mutants of both Sc and Sp in this study (Kao
et al. 2010) such that spore deaths have three potential 
causes: random death, aneuploidy, and genetic incompatibil- 
ity. Random death refers to spore death caused by deleterious 
mutations, meiotic errors, or environmental factors, and is 
assumed to have the same rate in the parental species and 
their hybrid. 

The following steps outline the procedure of simulating 
spore production (fig. 1B). First, to simulate the hybridization
between the two yeast species, we set the in silico genome to 
contain 16 chromosomes with lengths following those of Sc. 
SND density was set to be one per seven nucleotides based on 
the 85% sequence identity between the two species. We 
assume N pairs of incompatibilities and randomly assign 
them to the existing SNDs. The effects of these N pairs of 
incompatibilities on F1 spore inviability were either set to be 
equal or set to follow a certain distribution. The number of 
crossovers generated during the meiosis of F1 hybrids 
followed a Poisson distribution with a mean of 35.6 per mei- 
osis (Kao et al. 2010) and the crossovers were randomly 
assigned to the genome. Meiotic gene conversion and variable 
recombination rates across the genome are not considered. 
After meiosis, four spores are generated. We then calculate 
spore viability as described in the next section and stochasti- 
cally determine viable spores based on their viabilities. 
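
The meiosis step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the simulation code used in this study: crossover positions are discretized to marker indices, each crossover joins one randomly chosen chromatid from each homolog, and the function names (poisson, simulate_tetrad) are hypothetical. Spore viability is not modeled here.

```python
import math
import random

def poisson(rng, mean):
    """Draw from a Poisson distribution with Knuth's multiplication method."""
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def simulate_tetrad(markers_per_chrom, mean_crossovers=35.6, rng=None):
    """Return four spores; each spore is a list of chromosomes, and each
    chromosome is a list of marker alleles (0 = Sc, 1 = Sp)."""
    rng = rng or random.Random()
    total = sum(markers_per_chrom)
    # Four chromatids per chromosome: two Sc sisters and two Sp sisters.
    tetrad = [[[0] * n, [0] * n, [1] * n, [1] * n] for n in markers_per_chrom]
    for _ in range(poisson(rng, mean_crossovers)):
        pos = rng.randrange(total)  # uniform position over the whole genome
        for chrom, n in enumerate(markers_per_chrom):
            if pos < n:
                break
            pos -= n
        # A crossover exchanges the distal segments of two non-sister chromatids.
        x = tetrad[chrom][rng.choice((0, 1))]
        y = tetrad[chrom][rng.choice((2, 3))]
        x[pos:], y[pos:] = y[pos:], x[pos:]
    spores = [[] for _ in range(4)]
    for chromatids in tetrad:
        # Meiosis I: homologs go to opposite poles in a random orientation.
        first, second = chromatids[:2], chromatids[2:]
        if rng.random() < 0.5:
            first, second = second, first
        # Meiosis II: sister chromatids separate at random.
        rng.shuffle(first)
        rng.shuffle(second)
        for spore, chromatid in zip(spores, first + second):
            spore.append(chromatid)
    return spores
```

Because gene conversion is not modeled, every marker segregates 2:2 among the four spores of a tetrad, which provides a simple sanity check on the sketch.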

In an actual experiment, the viable spores may be genotyped
by restriction enzyme digestion (Xu and He 2011), microarray-based
SND typing (Kao et al. 2010), or genome sequencing.
Here, we use 1,207 SNDs (1 per 10 kb) as markers in linkage
analysis. Using more markers does not improve the precision 
or power of identifying BDM incompatibilities because of
limited recombination in msh2 Sc-Sp hybrids: 10,000
nucleotides correspond to 1.5 cM. Using 1 marker per
10 kb means that the expected mapping resolution is at best
2.5 kb.
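
The marker density figures above follow from simple arithmetic, sketched here in Python. The only ingredient not stated in the text is the standard convention that one crossover per meiosis corresponds to 50 cM of genetic map length; the constant names are illustrative.

```python
GENOME_BP = 12.07e6            # genome size used in the simulation
CROSSOVERS_PER_MEIOSIS = 35.6  # msh2 hybrid (Kao et al. 2010)
MARKER_SPACING_BP = 10_000     # 1 marker per 10 kb

# Assumed convention: one crossover per meiosis equals 50 cM of map length.
map_length_cm = CROSSOVERS_PER_MEIOSIS * 50
cm_per_marker_interval = map_length_cm * MARKER_SPACING_BP / GENOME_BP
n_markers = round(GENOME_BP / MARKER_SPACING_BP)

print(f"{cm_per_marker_interval:.2f} cM per 10 kb, {n_markers} markers")
```

The result is close to 1.5 cM per 10 kb and 1,207 markers, in agreement with the values quoted above.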

Our preliminary analysis revealed that any BDM incompat- 
ibility between two intrachromosomal loci is difficult to detect 
due to strong linkage. Hence, we examine the frequencies of 
spores for every pair of interchromosomal SND markers. That 
is, for markers A and B that are located on different chromo- 
somes, we obtain the numbers of spores with the genotypes 
of A_Sc B_Sc (a), A_Sp B_Sc (b), A_Sc B_Sp (c), and A_Sp B_Sp (d), respectively.
These numbers form a 2 x 2 table (fig. 1C), from which
three statistics are calculated: chi-squared value, G test statis- 
tic, and odds ratio (OR) (discussed later). Because of viability 
differences among the four genotypes, the incompatible ge- 
notype should have a reduced frequency, compared with its 
expected value. 

In theory, when the sample size is sufficiently large, we 
should be able to recover the pre-assigned incompatible allelic 
pairs. After acquiring a statistic of genetic incompatibility for 
each pair of markers, we determine statistical significance 
using a familywise 5% type I error rate (discussed later). We 
then attempt to estimate the chromosomal segments encom- 
passing the incompatibility genes (discussed later). 

Calculating Spore Viability 

In our simulation, random death, aneuploidy, and BDM in- 
compatibility are three causes of F1 spore inviability. We set 
the random death rate to be R = 1 - 0.804 = 0.196, based on
the fact that S. cerevisiae and S. paradoxus msh2 mutants
have spore viabilities of 84.0% and 80.4%, respectively 
(Hunter et al. 1996). It has been estimated that aneuploidy 
occurs at a frequency of 0.29 per viable msh2 Sc-Sp hybrid 
spore (Kao et al. 2010), but it is unknown what the corre- 
sponding fraction is in dead spores. The impact of aneuploidy 
on spore viability is complicated. Although loss of a chromo- 
some is lethal, gain of an extra chromosome could be 
beneficial if it masks the deleterious effect of genetic incom- 
patibility. We set the probability of spore inviability due to 
aneuploidy to be either U=0% or 50% to obtain a minimal 
and a more realistic estimate of the required sample size for 
identifying BDM incompatibilities, respectively. Inviability 
caused by aneuploidy is applied to pairs of sister spores be- 
cause nondisjunction typically occurs in meiosis I of the hybrid 
(Hunter et al. 1996). We assume no epistasis among incom- 
patible gene pairs. Let T be the fraction of viable spores pro- 
duced by F1 hybrids, N be the number of BDM incompatibility 
pairs between Sc and Sp, and I_k be the probability of spore
death caused by the kth pair of incompatibility, or penetrance.
We have

T = (1 - R)(1 - U) \prod_{k=1}^{N} [0.75 + 0.25(1 - I_k)].    (1)

Fig. 1. General strategy of simulating the identification of BDM incompatibilities between Saccharomyces cerevisiae (Sc) and S. paradoxus (Sp). (A) The
Sc allele at locus A and the Sp allele at locus B are incompatible, leading to reduced viability when in the same spore. (B) Procedure for detecting BDM
incompatibility between Sc and Sp. (C) A 2 x 2 table for spore counts of each marker pair. Several statistics for genetic incompatibility are computed using
these counts.



In the simple case of I_k = I for all N incompatible pairs, we have

T = (1 - R)(1 - U)[0.75 + 0.25(1 - I)]^{N}.    (2)
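
As a numerical check of equations (1) and (2), the following Python sketch evaluates the expected spore viability for the parameters used later in the Results (R = 0.196, U = 0, N = 10, I = 0.75). The function name spore_viability is illustrative and not taken from the original study.

```python
def spore_viability(R, U, penetrances):
    """Equation (1): expected fraction of viable F1 spores.

    R: random death rate; U: probability of aneuploidy-induced inviability;
    penetrances: iterable of I_k, one per incompatible pair.
    """
    T = (1 - R) * (1 - U)
    for I_k in penetrances:
        # A spore carries the incompatible genotype with probability 0.25
        # and then survives this pair with probability 1 - I_k.
        T *= 0.75 + 0.25 * (1 - I_k)
    return T

# Equal-effect case, equation (2): N = 10 pairs with I = 0.75 each.
T = spore_viability(R=0.196, U=0.0, penetrances=[0.75] * 10)
print(f"{T:.3f}")
```

With these inputs the viability comes out at about 10%, matching the viability of msh2 hybrid spores cited in the Introduction.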



Statistics Characterizing Genetic Incompatibility 

Genetic incompatibility between A_Sc and B_Sp leads to a reduction
in the frequency of A_Sc B_Sp compared with its expected
value. This signal can be detected in multiple ways. Because of 
strong linkage within a chromosome, we only evaluate pairs 
of markers that reside on different chromosomes. In a previous
study (Kao et al. 2010), a chi-squared test was used to test
whether the frequency of a recombinant equals the product 
of corresponding allele frequencies. For example, if the A_Sc
and B_Sc frequencies among viable F1 spores are 0.3 and 0.5,
respectively, the expected frequency of viable A_Sc B_Sc spores is
0.3 x 0.5 = 0.15. Chi-squared is then calculated by summing
over all genotypes the squared difference between the expected
and observed numbers of a genotype divided by the
expected number. This test is nondirectional in the sense that
it does not distinguish whether the recombinants are overrep- 
resented or underrepresented. Besides the chi-squared test, 
the G test of independence may be used to test the goodness 
of fit of the observed genotype frequencies to their expected 
values. The G test is designed for cases where the margins of a
2 x 2 table are not fixed by investigators whereas the total
number in the four cells of the table is fixed (Sokal and Rohlf 
1995). We conduct the G test with Williams's correction 
(Sokal and Rohlf 1995). In addition, we calculate an OR by 
dividing the product of the numbers of the two parental ge- 
notypes by that of the two recombinant genotypes: 
OR = (a x d)/(b x c) (fig. 1C).
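
The three statistics can be computed from the four counts as in the Python sketch below. It is an illustration rather than the code used in this study: the function name is hypothetical, Williams's correction is written in the form commonly given for a 2 x 2 test of independence (Sokal and Rohlf 1995), all marginal totals are assumed to be nonzero, and returning infinity when b x c = 0 is an added convention.

```python
import math

def two_by_two_stats(a, b, c, d):
    """Chi-squared, Williams-corrected G, and OR for one marker pair.

    a = A_Sc B_Sc, b = A_Sp B_Sc, c = A_Sc B_Sp, d = A_Sp B_Sp (spore counts).
    """
    n = a + b + c + d
    rows = (a + b, c + d)      # totals for B_Sc and B_Sp
    cols = (a + c, b + d)      # totals for A_Sc and A_Sp
    observed = ((a, b), (c, d))
    chi2 = 0.0
    g = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            o = observed[i][j]
            chi2 += (o - expected) ** 2 / expected
            if o > 0:          # a cell with zero count contributes 0 to G
                g += 2 * o * math.log(o / expected)
    # Williams's correction factor for a 2 x 2 table.
    q = 1 + ((n / rows[0] + n / rows[1] - 1)
             * (n / cols[0] + n / cols[1] - 1)) / (6 * n)
    odds_ratio = (a * d) / (b * c) if b * c > 0 else math.inf
    return chi2, g / q, odds_ratio

chi2, g_adj, odds_ratio = two_by_two_stats(60, 50, 30, 60)
```

In this made-up example the recombinant classes b and c are depleted relative to the parental classes, so the OR exceeds 1, whereas chi-squared and G report only the size of the deviation and not its direction.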

Because multiple pairs of markers are tested in an experi- 
ment, we evaluate the significance of the earlier statistics by 
controlling the familywise type I error rate. We first randomly 
shuffle each of the 16 chromosomes among spores and then 
find the highest statistic among all pairs of markers. We con- 
duct this shuffling 100 times and rank the resulting 100 
highest statistics. The 5th largest number among these 100
numbers is chosen as the critical value corresponding to a 
familywise type I error rate of 5%. 
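
The shuffling procedure can be sketched in Python as follows. The data layout (genotypes[chromosome][spore] as a tuple of 0/1 marker alleles), the function names, and the 0.5 pseudocount in the example OR are assumptions made for illustration only; the original analysis may differ in these details.

```python
import random
from itertools import combinations

def odds_ratio(counts):
    """OR from counts[A allele][B allele], with 0 = Sc and 1 = Sp.
    The 0.5 added to each cell avoids division by zero (an assumption)."""
    a, c = counts[0][0] + 0.5, counts[0][1] + 0.5
    b, d = counts[1][0] + 0.5, counts[1][1] + 0.5
    return (a * d) / (b * c)

def max_statistic(genotypes, statistic):
    """Highest statistic over all interchromosomal marker pairs."""
    best = float("-inf")
    for c1, c2 in combinations(range(len(genotypes)), 2):
        for m1 in range(len(genotypes[c1][0])):
            for m2 in range(len(genotypes[c2][0])):
                counts = [[0, 0], [0, 0]]
                for s1, s2 in zip(genotypes[c1], genotypes[c2]):
                    counts[s1[m1]][s2[m2]] += 1
                best = max(best, statistic(counts))
    return best

def familywise_threshold(genotypes, statistic, n_shuffles=100, rank=5, seed=0):
    """Critical value: the rank-th largest of the shuffled maxima."""
    rng = random.Random(seed)
    maxima = []
    for _ in range(n_shuffles):
        shuffled = []
        for chrom in genotypes:
            # Shuffle whole chromosomes among spores: linkage within a
            # chromosome is kept, association between chromosomes is broken.
            permuted = list(chrom)
            rng.shuffle(permuted)
            shuffled.append(permuted)
        maxima.append(max_statistic(shuffled, statistic))
    maxima.sort(reverse=True)
    return maxima[rank - 1]
```

With n_shuffles=100 and rank=5, the returned value corresponds to the familywise 5% critical value described above; an observed marker pair would be called significant when its statistic exceeds this value.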

After applying the cutoff, we group statistically significant 
pairs of markers as follows. Let us use the OR as an example, 
but the same procedure applies to the other statistics used. 
First, we find the maximal OR, and take a step of seven mar- 
kers on each side of each focal marker to obtain the initial 
square of close linkage. The number seven is chosen by con- 
sidering the tradeoff between grouping markers showing sig- 
nals of different incompatibilities and dividing markers 
showing the signal of the same incompatibility. Second, we 
keep expanding the square with a step size of one marker 
until it is no longer significant or it reaches an end of the 
chromosome. Third, if two squares overlap with each other, 
we ignore the square with the lower maximal OR. Fourth, we 
repeat these steps until all significant pairs of markers are in- 
cluded in the squares. Fifth, the marker pair of the maximal OR 
of each square is recorded. If two adjacent marker pairs in the 
same square tie for the maximal OR, we record the locations 
of their midpoints. 

A preassigned BDM incompatible pair is considered to be 
correctly identified when both causal SNDs are within seven 
markers from the maximum in an aforementioned square. 
Sensitivity is calculated as the fraction of true incompatible 
pairs identified. False discovery rate is calculated as the total 
number of false discoveries divided by the total number of 
discoveries. When no discovery is made in all simulations, 
false discovery rate is defined as 0. Genomic distance is calcu- 
lated as the average distance between the two identified 
markers and their respective causal SNDs. Standard errors of 
sensitivity, false discovery rate, and genomic distance esti- 
mates are estimated using 1,000 bootstrap samples. 
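
A minimal Python sketch of these performance measures is given below. It assumes that each simulation is summarized as a tuple (number of preassigned pairs, number correctly identified, number of false discoveries) and that the bootstrap resamples whole simulations; both the layout and the resampling unit are assumptions for illustration, as are the function names.

```python
import random
import statistics

def sensitivity_and_fdr(replicates):
    """Pool simulations and return (sensitivity, false discovery rate)."""
    total_true = sum(r[0] for r in replicates)
    true_found = sum(r[1] for r in replicates)
    false_found = sum(r[2] for r in replicates)
    discoveries = true_found + false_found
    sensitivity = true_found / total_true
    # The false discovery rate is defined as 0 when nothing is discovered.
    fdr = false_found / discoveries if discoveries else 0.0
    return sensitivity, fdr

def bootstrap_se(replicates, n_boot=1000, seed=0):
    """Standard errors of sensitivity and FDR from bootstrap resampling."""
    rng = random.Random(seed)
    sens, fdrs = [], []
    for _ in range(n_boot):
        resample = [rng.choice(replicates) for _ in replicates]
        s, f = sensitivity_and_fdr(resample)
        sens.append(s)
        fdrs.append(f)
    return statistics.stdev(sens), statistics.stdev(fdrs)
```

For instance, three simulations summarized as (10, 4, 1), (10, 5, 2), and (10, 3, 0) pool to 12 of 30 true pairs identified and 3 false discoveries among 15.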

Results 

OR Outperforms Other Statistics in Identifying Genetic 
Incompatibility 

Following Kao et al. (2010), we use msh2 mutants of Sc and
Sp in our simulation of identifying BDM incompatibilities, 
unless otherwise noted. Based on theoretical predictions and 
experimental results (Welch 2004; Wu and Ting 2004; Lee 
et al. 2008), we assume that genetic incompatibility is asymmetrical.
That is, if A_Sc and B_Sp are incompatible, A_Sp and B_Sc
can still be compatible (fig. 1A). It is difficult to detect BDM
incompatibility between two loci that reside in the same chro- 
mosome because of limited recombination in the hybrid. 
Hence, we only examine pairs of markers located on different 
chromosomes. That is, for markers A and B that are located on 
different chromosomes, we obtain the numbers of spores 
with the genotypes of A_Sc B_Sc (a), A_Sp B_Sc (b), A_Sc B_Sp (c), and
A_Sp B_Sp (d), respectively, which form a 2 x 2 table (fig. 1C).
Because of viability differences among the four genotypes,
the incompatible genotype should have a reduced frequency,
compared with its expected value (fig. 1B). In theory, when
the sample size is sufficiently large, we should be able to 
detect such incompatible allelic pairs. 

We calculate three statistics using the 2 x 2 table: chi-squared,
G test statistic, and OR = (a x d)/(b x c) (see
Materials and Methods), and evaluate their relative perfor- 
mances in identifying preassigned incompatibilities by simula- 
tion. The chi-squared statistic was previously used in this 
context (Kao et al. 2010), but this statistic does not differen- 
tiate between overrepresentation and underrepresentation of 
a genotype relative to its expectation and thus may be less 
specific. Because the chi-squared test is an approximation of the G
test, the two have similar properties, although the G test may be
more precise. By contrast, a lower-than-expected OR indicates
overrepresentation of A_Sp B_Sc and/or A_Sc B_Sp, whereas a higher-than-expected
OR indicates depletion of these genotypes,
which is predicted under genetic incompatibility. After acquir- 
ing a statistic of genetic incompatibility for each interchromo- 
somal marker pair, we determine statistical significance using 
a familywise 5% type I error rate to control multiple testing. 
We then identify the chromosomal segments that are likely to 
encompass the incompatibility genes (see Materials and 
Methods). 

Because the incompatible marker pairs are preassigned in 
the simulation, we can evaluate how well the three statistics 
perform in terms of the 1) sensitivity, 2) false discovery rate, 
and 3) mean genomic distance between the identified mar- 
kers and the preassigned incompatible SNDs. For each param- 
eter set, we conduct 400 simulation replications and pool the 
data in our analysis. Sensitivity is the fraction of all preassigned 
incompatible pairs that are recovered by the analysis. False 
discovery rate is the number of false discoveries divided by 
the total number of discoveries. The standard errors of 
these estimates are estimated by bootstrapping the pooled 
data 1,000 times. There are 12.07 million nucleotides
x 15% = 1.8105 million SNDs between Sp and Sc. We
randomly assigned N pairs of SNDs to form N incompatibility 
pairs. In mapping these incompatibilities, however, we use 
only 1,207 markers, or 1 marker per 10,000 nucleotides, be- 
cause the use of more markers does not increase mapping 
resolution due to limited recombination (see Materials and 
Methods). 

We start the simulation with the following parameters. We 
assume no contribution of aneuploidy to spore inviability, and 
set N = 10 pairs of incompatibilities that have equal effects on
inviability. Given the known viability of msh2 hybrid spores,
the 10 pairs each contribute I = 0.75 to spore inviability. That
is, a spore carrying one incompatible pair is 25% as viable as a
spore without any incompatibility. The 10 pairs of incompatibilities
(i.e., 20 causal SNDs) are randomly distributed in the 16
yeast chromosomes. The number of viable spores genotyped
is M = 200. When OR is used, the sensitivity is 40%, significantly
greater than that of chi-squared (28%) or G test statistic
(30%) (fig. 2A). The false discovery rate under OR is 24%, not
significantly different from that under the other two statistics
(22% and 23%, respectively) (fig. 2B). The mean genomic
distance between the identified marker and the preassigned
incompatibility loci is 18.3 kb under OR, significantly smaller
than that under the other two statistics (19.3 and 19.1 kb,
respectively) (fig. 2C).

If the differences among the three methods are simply due 
to the fact that chi-squared and G test statistic cannot distin- 
guish whether parental or nonparental types are in excess, we 
could use the directional information from OR and consider 
only those chi-squared or G test statistic values when OR > 1.
Although such modified chi-squared and G test statistic out- 
perform their original versions in sensitivity, they are still worse 
than OR (fig. 2A). In terms of the false discovery rate, the 
modified versions appear worse than the original versions 
(fig. 2B). In terms of the genomic distance, the modified versions
are similar to the original versions (fig. 2C). We subsequently
confirmed the advantage of OR over chi-squared and
G test statistic in multiple conditions, by varying N, M, and the 
influence of aneuploidy (U) (table 1). When genetic incompat- 
ibility is symmetrical, however, the advantage of OR over chi- 
squared and G test statistic disappears (supplementary table 
S1, Supplementary Material online). 

Previous Studies Were Underpowered 

To understand why previous experimental searches of nuclear 
BDM incompatibilities between Sc and Sp were unsuccessful,
we perform a simulation following the scheme of a previous
experimental study, which genotyped 58 spores from F1 with
MSH2 and 48 spores from F1 lacking MSH2 (Kao et al. 2010).
Before we started the simulation, we confirmed that no pair of 
markers in that study (Kao et al. 2010) showed significant OR 
using our methodology. The simulation parameters used for 
msh2 spores are the same as described earlier. For mismatch 
repair proficient spores, random death rate is set to be 
R = 0.05 (Greig et al. 2002). Given the observed viability of
1% among these spores, the contribution of aneuploidy to
spore inviability (U) is calculated using equation (2) to be 
91.54% and 95.77%, for the corresponding numbers of 
0% and 50% in msh2 spores, respectively. To be consistent 
with the previous study (Kao et al. 2010), we used the density
of 1 marker per 2 kb. Using 1 marker per 10 kb yielded similar
results. 
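
The two values of U quoted above for mismatch repair proficient spores can be reproduced by solving equation (2) for U, as in this Python sketch. It assumes that the msh2 hybrid spore viability is taken as exactly 10% and that the incompatibility term of equation (2) is identical in MSH2 and msh2 spores; the function name is illustrative.

```python
def aneuploidy_inviability(T, R, incompat_factor):
    """Solve equation (2) for U, where incompat_factor is the term
    [0.75 + 0.25(1 - I)]^N contributed by all incompatible pairs."""
    return 1 - T / ((1 - R) * incompat_factor)

T_MSH2_MUTANT = 0.10   # assumed msh2 hybrid spore viability
R_MSH2_MUTANT = 0.196  # random death rate in msh2 spores

results = []
for U_mutant in (0.0, 0.5):
    # Incompatibility term implied by the msh2 data under this U.
    factor = T_MSH2_MUTANT / ((1 - R_MSH2_MUTANT) * (1 - U_mutant))
    # Apply it to MSH2 spores: viability 1%, random death rate 0.05.
    results.append(aneuploidy_inviability(T=0.01, R=0.05, incompat_factor=factor))

print([f"{u:.2%}" for u in results])
```

Under these assumptions the sketch recovers values of roughly 91.54% and 95.77%.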

Assuming different pairs of incompatibilities in the simula- 
tion, we calculate the corresponding probabilities of nondis- 
covery, which is the probability that no marker pair has an OR 
that deviates significantly from the expectation at the family- 
wise 5% level. We first assume equal effects on spore viability 
from all pairs of incompatibilities. When aneuploidy does not 
reduce msh2 spore viability, at least 8 pairs of incompatibilities 
are required to explain the observed spore inviability. We 
found the probability of nondiscovery to exceed 5% in all 

Fig. 2. Performances of OR, chi-squared, and G test statistic for
detecting BDM incompatibilities. Data shown are from 400 simulations
of 10 incompatible pairs with equal I and no contribution of aneuploidy to
spore inviability. The sample size is 200 viable spores. OR, χ², and G represent
odds ratio, chi-squared, and G test statistic, respectively. χ²* and G*
respectively consider χ² and G only when OR > 1. Standard error, shown
by error bars, is estimated by 1,000 bootstrap replications. (A) Sensitivity of
the five tests. P values are from paired t test. (B) False discovery rates of the
five tests. (C) Average genomic distance between preassigned incompatibilities
and the identified significant markers.



cases except when N = 8 (fig. 3A). If aneuploidy reduces
msh2 spore viability by 50% and correspondingly reduces
the viability of MSH2 spores, there should be at least 5 pairs
of incompatibilities. Under this assumption, we found the
probability of nondiscovery to exceed 0.05 in all cases
except when N = 5 (fig. 3B). Thus, it is possible for the previous
experiment to have missed all incompatibilities. Our analysis 
tends to overestimate the power of the previous study, be- 
cause segments in spores with aneuploidy were ignored in the 
experimental study (Kao et al. 2010) such that the actual 
sample size is smaller than the number of sampled spores. 

Note. The results are from 400 simulations for each parameter set.
a Probability of aneuploidy-induced inviability.
b Number of pre-assigned BDM incompatibility pairs.
c Probability of spore death caused by each pair of incompatibility.
d Total number of genotyped spores.
e Odds ratio.
f χ² statistic.
g G test statistic.
h χ² statistic only when OR > 1.
i G test statistic only when OR > 1.
*P < 0.05 when comparing the performance of a statistic with that of OR by a paired t test.
**P < 0.005 when comparing the performance of a statistic with that of OR by a paired t test.



Furthermore, we have not considered genotyping errors, 
which would further decrease the statistical power. It might 
seem counter-intuitive that the more pairs of genetic incom- 
patibility there are, the more difficult it is to identify any of 
them. The underlying reason is that the total contribution of all
incompatibility pairs to inviability is fixed in this simulation and
that all pairs are assumed to contribute equally. Thus, having a 
larger number of incompatible pairs means a smaller contri- 
bution from each pair. 
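
This tradeoff, together with the minimal numbers of pairs quoted above (8 and 5), can be illustrated by inverting equation (2). The Python sketch below assumes an msh2 hybrid spore viability of exactly 10%; the function names are hypothetical.

```python
import math

def incompat_factor(T, R, U):
    """Part of the viability in equation (2) left to incompatibilities."""
    return T / ((1 - R) * (1 - U))

def min_pairs(T, R, U):
    """Smallest N that can explain T: each pair reduces viability by at
    most a factor of 0.75, which is reached when I = 1."""
    return math.ceil(math.log(incompat_factor(T, R, U)) / math.log(0.75))

def penetrance_per_pair(T, R, U, N):
    """Equal-effect I implied by equation (2) for a given N."""
    return 4 * (1 - incompat_factor(T, R, U) ** (1 / N))

for U in (0.0, 0.5):
    n_min = min_pairs(T=0.10, R=0.196, U=U)
    effects = [round(penetrance_per_pair(0.10, 0.196, U, N), 2)
               for N in (n_min, n_min + 2, n_min + 7)]
    print(U, n_min, effects)
```

The per-pair penetrance falls as N grows because the total inviability to be explained stays fixed, which is why a larger number of pairs is harder to detect under the equal-effect assumption.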

Because multiple pairs of genetic incompatibility are 
unlikely to have equal effect sizes on spore viability, it would 
be more realistic to consider unequal effect sizes. The diffi- 
culty, however, is that there is no prior knowledge on the 
effect size distribution. Because BDM incompatibilities may 
be similar to loss-of-function mutations (Maheshwari and
Barbash 2011), we assume that the effect size distribution
follows the distribution of the deleterious fitness effects of 
single-nonessential-gene deletions in yeast (Qian et al. 
2012). We randomly sample I from this distribution until the
total incompatibility explains the observed spore inviability. 
The mode of the number of incompatible pairs required to 
explain the observed spore inviability is 150 (fig. 3C) and 100
(fig. 3D) when the contribution of aneuploidy to msh2 spore
inviability is 0% and 50%, respectively. The corresponding
distributions of I under the two scenarios used in this simulation
study are presented in figure 3C and D, respectively, and
the probability of nondiscovery is 79% (fig. 3A) and 77% 
(fig. 3B), respectively. 
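
The sampling step described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the study: the empirical distribution of deletion fitness effects is not reproduced here, so a right-skewed Beta distribution serves as a hypothetical stand-in, pairs are assumed to act independently, and the target viability of 0.1 is an invented input. All function names are ours.

```python
import random


def sample_pairs_until_explained(target_viability, draw_effect, rng):
    """Draw per-pair penetrances f until the joint survival of all pairs
    falls to `target_viability` or below.

    Assumption (not from the text): pairs are independent, so joint
    survival is the product of (1 - f/4) over all sampled pairs.
    """
    effects, survival = [], 1.0
    while survival > target_viability:
        f = draw_effect(rng)
        effects.append(f)
        survival *= 1.0 - f / 4.0
    return effects, survival


def placeholder_effect(rng):
    # Hypothetical stand-in for the empirical deletion-effect distribution:
    # a right-skewed Beta(0.3, 3) that yields mostly small effects in [0, 1].
    return rng.betavariate(0.3, 3.0)


if __name__ == "__main__":
    rng = random.Random(1)
    effects, survival = sample_pairs_until_explained(0.1, placeholder_effect, rng)
    print(len(effects), "pairs sampled; joint survival", round(survival, 4))
    print("pairs with f > 0.2:", sum(f > 0.2 for f in effects))
```

Because most sampled effects are small, many pairs are needed to reach the target, and only a minority of them have large f.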

Because the study by Kao et al. (2010) was the largest
experiment for identifying BDM incompatibilities between
Sc and Sp, our results suggest that none of the previous studies
on the subject were sufficiently powerful to detect BDM
incompatibilities between the two yeasts.

Fig. 3. Sample size in Kao et al. (2010) is too small to detect BDM incompatibilities with incomplete penetrance. Data shown are from 200 simulations
for each parameter set used. (A) Probability of nondiscovery in a study by Kao et al. (2010) when aneuploidy is assumed to cause no msh2 spore inviability
[...] cause U = 50% inviability to msh2 spores. White bars show the results for incompatibilities with equal effects, whereas the gray bar shows the result for 100
incompatibility pairs with unequal effects as described in (D). (C) Distribution of the effect sizes (i.e., penetrances) of 150 BDM incompatibility pairs (under [...]

Sample Sizes Required for Identifying BDM 
Incompatibilities 

How many viable spores should be genotyped to identify BDM
incompatibilities with a reasonable success rate? Here, we
again assume the exclusive use of msh2 strains in the experiment.
Under the assumption of no effect from aneuploidy on
viability, we examine the scenarios of N = 8, 10, and 15
incompatible pairs with equal effects, using sample sizes of
M = 100, 200, 400, and 800 spores. In the case of N = 8, the
probability of nondiscovery is negligible even when M = 100
(fig. 4A). In the case of N = 10 and 15, the probability of
nondiscovery declines quickly as M increases from 100 to 200
and 400 (fig. 4A). As expected, the total number of discoveries
increases with the sample size M (fig. 4B), as does the
sensitivity (fig. 4C). By contrast, the false discovery rate
(fig. 4D) and the mean genomic distance between the causal SNDs
and the identified markers (fig. 4E) generally decline with M.
We also examined the situation when the probability of msh2
spore inviability due to aneuploidy is 50% and obtained overall
similar results (fig. 4F-J). Figure 5 shows randomly picked
examples of our simulation results under various M when N is
fixed at 10 and U at 0. Because the two loci of one
incompatibility pair happen to reside on the same chromosome,
the maximal number of pairs detectable is 9. Increasing the
sample size clearly increases the power of detection. Similar
patterns can be seen when U = 0.5 (supplementary fig. S1,
Supplementary Material online).
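
Why power grows with M can be seen from a back-of-the-envelope calculation. The sketch below is not the simulation used in the study. It considers a single asymmetric two-locus incompatibility with penetrance f between unlinked loci, ignores all other pairs and aneuploidy, takes the odds ratio to be the cross-product ratio of the 2 x 2 genotype table of viable spores (an assumed definition), and approximates the standard error of log(OR) with Woolf's formula evaluated at the expected counts. The penetrance value and all function names are illustrative.

```python
import math


def viable_genotype_probs(f):
    """Expected genotype frequencies among viable haploid spores for two
    unlinked loci; only the (Sc at locus A, Sp at locus B) combination is
    incompatible and dies with probability f."""
    raw = {"ScSc": 0.25, "ScSp": 0.25 * (1.0 - f), "SpSc": 0.25, "SpSp": 0.25}
    total = sum(raw.values())
    return {g: p / total for g, p in raw.items()}


def expected_or(f):
    """Cross-product odds ratio of the expected 2 x 2 genotype table."""
    p = viable_genotype_probs(f)
    return (p["ScSc"] * p["SpSp"]) / (p["ScSp"] * p["SpSc"])


def se_log_or(f, m):
    """Woolf's approximation: sqrt of the sum of reciprocal expected counts."""
    p = viable_genotype_probs(f)
    return math.sqrt(sum(1.0 / (m * q) for q in p.values()))


if __name__ == "__main__":
    f = 0.5  # hypothetical penetrance
    for m in (100, 200, 400, 800):
        z = math.log(expected_or(f)) / se_log_or(f, m)
        print(f"M = {m:4d}  expected OR = {expected_or(f):.2f}  log(OR)/SE = {z:.2f}")
```

Because the standard error scales as 1/sqrt(M), quadrupling the sample doubles the signal-to-noise ratio of log(OR), before any correction for testing many marker pairs.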

To obtain a more realistic estimate of the required sample
size for detecting incompatibilities, we use the aforementioned
unequal effect sizes depicted in figure 3C and D,
respectively. Because, under this model, most incompatibilities
have small effects, which are hard to detect, we focus on
incompatibilities with f > 0.2 and the subset with f > 0.4
when evaluating sensitivity, false discovery rate,
and genomic distance. The probability of nondiscovery, however,
is evaluated as originally defined. As mentioned above,
when there is no contribution of aneuploidy to msh2 spore
inviability, 150 incompatibility pairs are required to explain the
observed spore inviability. Among them, 10 pairs have f > 0.2,
four of which have f > 0.4 (fig. 3C). When there is a 50%
contribution of aneuploidy to msh2 spore inviability, 100
incompatibility pairs are required to explain the observed spore
inviability. Among them, six pairs have f > 0.2, two of which
have f > 0.4 (fig. 3D). Our simulation (fig. 6) shows that a
much larger sample is required for successful detection of
BDM incompatibilities under unequal effect sizes than under
equal effect sizes. For example, when M = 1,600, the probability
of nondiscovery becomes negligible (fig. 6A and E). With
such a large sample, the sensitivity is approximately 40% for
f > 0.2 and approximately 80% for f > 0.4 (fig. 6B and F), and
the false discovery rate is approximately 30% for f > 0.2 and
approximately 50% for f > 0.4 (fig. 6C and G). The mean genomic
distance is between 15 and 20 kb for both f > 0.2 and
f > 0.4 (fig. 6D and H).
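
The evaluation metrics used above can be sketched in code. The snippet is an illustration under stated assumptions rather than the evaluation procedure of the study. It treats a preassigned pair as correctly identified when both of its loci lie within 7 markers (70 kb at 1 marker per 10 kb) of an identified OR peak, as stated in the legend of figure 5. It further assumes that sensitivity is the fraction of true pairs recovered and that the false discovery rate is the fraction of peaks matching no true pair. Positions are genome-wide marker indices; the function names and toy coordinates are hypothetical.

```python
def matches(true_pair, peak, window=7):
    """True if both loci of the pair are within `window` markers of the peak."""
    return (abs(true_pair[0] - peak[0]) <= window
            and abs(true_pair[1] - peak[1]) <= window)


def sensitivity_and_fdr(true_pairs, peaks, window=7):
    """Return (sensitivity, false discovery rate) for a set of OR peaks."""
    found = sum(any(matches(t, p, window) for p in peaks) for t in true_pairs)
    false_peaks = sum(not any(matches(t, p, window) for t in true_pairs)
                      for p in peaks)
    sensitivity = found / len(true_pairs) if true_pairs else float("nan")
    fdr = false_peaks / len(peaks) if peaks else 0.0
    return sensitivity, fdr


if __name__ == "__main__":
    # Toy marker coordinates, invented for illustration only.
    true_pairs = [(120, 830), (400, 1010), (55, 610), (700, 990)]
    peaks = [(123, 826), (398, 1015), (250, 900)]
    sens, fdr = sensitivity_and_fdr(true_pairs, peaks)
    print(f"sensitivity = {sens:.2f}, FDR = {fdr:.2f}")
```

In the toy input, two of the four true pairs have a nearby peak and one of the three peaks matches nothing, so the two metrics can move independently of each other.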

Discussion 

In this study, we demonstrate that OR outperforms the chi-squared
and G test statistics in detecting asymmetrical BDM
incompatibility through linkage analysis. Our simulation suggests
that the existence of two-locus BDM incompatibility between
Sc and Sp cannot be excluded and that its nondiscovery in
previous yeast experiments could be due to the limited sample
size and low statistical power. Our study provides important
guidelines for designing experiments for identifying yeast
BDM incompatibilities and for interpreting potential experimental
outcomes. More generally, it highlights the importance
of understanding the statistical properties of an
experimental method (e.g., sensitivity and false discovery
rate) to use it efficiently and interpret the result correctly.

Fig. 5. An example showing the benefit of using large samples in
identifying genetic incompatibilities. (A) Genomic positions of 10 pairs of
randomly placed equal-effect genetic incompatibilities in the simulation.
Genomic positions are defined by marker numbers on both axes. Note that
the two loci of one incompatibility pair, near marker 1,200 on both axes, are
located on the same chromosome and are therefore undetectable in our study
because only interchromosomal marker pairs are examined. Color shows the
expected OR. Spore viability is assumed to be unaffected by aneuploidy. (B, D,
F, H) ORs for all interchromosomal marker pairs when the sample size
(number of viable msh2 spores genotyped) is (B) 100, (D) 200, (F) 400,
and (H) 800, respectively. Color shows the observed OR (OR < 1 is not
shown). (C, E, G, I) Interchromosomal marker pairs whose OR values are
significant, when the sample size is (C) 100, (E) 200, (G) 400, and (I) 800,
respectively. The identified incompatibilities are circled, with the correct
[...] incompatible pair is considered to be correctly identified only when both
loci of a preassigned pair are within 7 markers (i.e., 70 kb) from an identified
OR peak. X and Y labels in (B-I) are the same as in (A).

We made several assumptions in our simulation that are
worth discussing. First, for simplicity, we assumed that recombination
rates are equal throughout the genome and ignored
recombination hot/cold spots and interference between
crossovers (Mancera et al. 2008). This assumption should
not affect the overall results because of the relatively low
marker density used (1 per 10 kb). But recombination rate
variation would make the genomic distances between the
causal SNDs and the identified markers more variable across
the genome. Second, due to the lack of prior knowledge on
the distribution of f, we assumed either equal f values for different
incompatibility pairs or unequal f values that follow a
specific distribution mimicking the fitness effects of gene deletions.
We believe that the results from unequal f values are closer
to the truth than those from equal f values. Third, we assumed that
BDM incompatibility is asymmetrical, which is in accordance
with the theory and most of the incompatible pairs identified
so far (Wu and Beckenbach 1983; Meierjohann et al. 2004;
Welch 2004). Nevertheless, our test still works even when the
incompatibility is symmetrical (supplementary table S1, Supplementary
Material online). Fourth, it is unclear how much aneuploidy affects
viability in msh2 spores, and we used 0% and 50% in our study
to get a sense of the range of possible
outcomes. Fifth, we assumed no error in genotyping the
spores. Although genotyping errors would reduce the statistical
power, we expect the genotyping error rate to be low,
especially when high-coverage next-generation DNA sequencing
is used. Moreover, due to low recombination rates, nearby
SNDs can be used for correction of sequencing errors at specific
positions. Sixth, we did not explicitly study high-order
incompatibility, but because high-order incompatibility is
equivalent to two-locus incompatibility with incomplete penetrance,
our results apply to high-order incompatibility. For
example, f = 0.5 in a two-locus incompatibility (fig. 3) is equivalent
to a three-locus incompatibility with 100% penetrance.
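
The equivalence in the last example can be checked by enumerating haploid genotypes. The sketch assumes unlinked loci with the two parental alleles equally frequent among spores. Which allele at the third locus completes the incompatibility is arbitrary (Sc is used here as a hypothetical choice), and the function names are ours.

```python
from fractions import Fraction
from itertools import product

ALLELES = ("Sc", "Sp")


def death_fraction_two_locus(f):
    """Fraction of spores killed when A=Sc with B=Sp is lethal with
    probability f (loci unlinked, alleles equally frequent)."""
    killed = Fraction(0)
    for a, b in product(ALLELES, repeat=2):
        if (a, b) == ("Sc", "Sp"):
            killed += Fraction(1, 4) * f
    return killed


def death_fraction_three_locus():
    """Fraction killed when A=Sc, B=Sp, and C=Sc together are always lethal."""
    killed = Fraction(0)
    for a, b, c in product(ALLELES, repeat=3):
        if (a, b, c) == ("Sc", "Sp", "Sc"):
            killed += Fraction(1, 8)
    return killed


def apparent_two_locus_penetrance():
    """Death probability among A=Sc, B=Sp spores under the three-locus model."""
    return death_fraction_three_locus() / Fraction(1, 4)


if __name__ == "__main__":
    print(death_fraction_two_locus(Fraction(1, 2)), death_fraction_three_locus())
    print(apparent_two_locus_penetrance())
```

When only loci A and B are examined, half of the spores with the incompatible two-locus genotype also carry the third allele, so the fully penetrant three-locus incompatibility appears as a two-locus incompatibility with f = 0.5.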

In our simulation, we used 1 marker per 10 kb to look for
BDM incompatibility. Although next-generation sequencing-based
genotyping will offer many more markers, the extra
markers do not enhance the mapping resolution, because
the low recombination rate in msh2 F1 makes all markers
within a 10 kb segment almost completely linked. Because
of this property, pairs of incompatible genes that are located
on the same chromosome are difficult to detect and therefore
are not examined in our simulation. Intrachromosomal incompatible
gene pairs are expected to constitute only 7.54% of all
incompatible pairs if incompatibility genes are uniformly distributed
in the genome.
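
The intrachromosomal fraction follows from chromosome lengths alone: if two genes are placed independently and uniformly along the genome, the chance that both fall on the same chromosome is the sum of the squared length shares. The sketch below assumes approximate lengths of the 16 nuclear chromosomes of the S. cerevisiae S288C reference assembly. These values are hard-coded by us as an assumption and are not taken from the text, which does not state which lengths were used.

```python
# Approximate lengths (bp) of the 16 nuclear chromosomes in the S. cerevisiae
# S288C reference assembly; hard-coded here as an assumption.
CHROMOSOME_LENGTHS_BP = [
    230_218, 813_184, 316_620, 1_531_933, 576_874, 270_161, 1_090_940,
    562_643, 439_888, 745_751, 666_816, 1_078_177, 924_431, 784_333,
    1_091_291, 948_066,
]


def same_chromosome_probability(lengths):
    """Probability that two loci placed independently and uniformly along
    the genome fall on the same chromosome: sum of squared length shares."""
    total = sum(lengths)
    return sum((length / total) ** 2 for length in lengths)


if __name__ == "__main__":
    p = same_chromosome_probability(CHROMOSOME_LENGTHS_BP)
    print(f"intrachromosomal fraction = {p:.2%}")
```

With these lengths the result comes out at roughly 7.5%, in line with the value quoted above.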

We found that, by the current method, much larger samples
than previously used are required for identifying yeast
BDM incompatibilities with incomplete penetrance. Given
the rapid increase in DNA sequencing capacity and decline
in sequencing cost, genotyping approximately 1,000 spores
is no longer out of reach. In fact, a recent study sequenced the
genomes of 1,000 F2 individuals from a genetic cross between
two yeast strains in order to map quantitative traits (Bloom
et al. 2013). Our simulation shows that by genotyping 800 to
1,600 F1 spores, there is a reasonable chance of identifying
genetic incompatibilities with relatively high penetrance
(>20%).

Fig. 6. Genotyping more F1 spores improves the efficiency of identifying BDM incompatibilities with unequal effect sizes. (A) Probability of non- [...]
figure 3D. Data shown are from 200 simulations per parameter set. Error bars show standard errors estimated from 1,000 bootstrap samples.

Given today's DNA sequencing capacity, an
alternative strategy for identifying BDM incompatibility may be
used. This strategy involves two steps. First, because an incompatibility
allele (e.g., A Sc in fig. 1A) has a fitness of 1 - 0.25f
relative to its alternative (e.g., A Sp), it is relatively easy to identify
it by sequencing a pool of viable F1 spores en masse.
Second, after identifying low-fitness alleles, one can then
look for their incompatible partners by sequencing individual
spores. Because of the reduced number of marker pairs to be
tested, the sample size required in the second step will be
much smaller. A critical requirement in this design is to minimize
the competition among spores in mitotic growth before
sequencing them en masse, because allelic differences in
growth rate between Sc and Sp that are unrelated to the
incompatibility for spore viability may be common.
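
The reduction in the number of tests in the second step can be illustrated with simple counting. The sketch assumes about 1,200 genome-wide markers (roughly a 12 Mb genome at 1 marker per 10 kb, a round hypothetical number), uses the 7.54% intrachromosomal fraction quoted earlier to approximate the number of interchromosomal pairs, and supposes that the first step yields five candidate loci, a number invented for illustration. Pairs between two candidates are counted twice, which is negligible at this scale.

```python
from math import comb


def genome_wide_pairs(n_markers, intra_fraction=0.0754):
    """Approximate number of interchromosomal marker pairs in a full scan."""
    return round(comb(n_markers, 2) * (1.0 - intra_fraction))


def candidate_pairs(n_markers, n_candidates, intra_fraction=0.0754):
    """Approximate number of interchromosomal pairs when only pairs that
    include one of `n_candidates` preselected loci are tested."""
    return round(n_candidates * (n_markers - 1) * (1.0 - intra_fraction))


if __name__ == "__main__":
    n = 1200  # hypothetical marker count
    k = 5     # hypothetical number of candidate loci from the pooled step
    full, reduced = genome_wide_pairs(n), candidate_pairs(n, k)
    print(f"full scan: {full} pairs; candidate scan: {reduced} pairs")
    print(f"reduction factor: {full / reduced:.0f}x")
```

Under these assumptions the candidate scan involves about two orders of magnitude fewer tests, which relaxes the multiple-testing burden that drives the large sample sizes in a genome-wide scan.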

Although Sc and Sp are used here to parameterize our
simulation study, our methodology and results are useful for
mapping recessive genetic incompatibilities in other species
when the haploid stage can be assayed, including species
with haplontic or haploid-diploid life cycles and diplontic species
that can undergo homozygous diploidization. Because
BDM incompatibility is a type of epistasis, our methods and
results also apply to the genomic detection of epistasis.

Genome Biol. Evol. 5(7):1261-1272 doi:10.1093/gbe/evt091 Advance Access publication June 6, 2013

Supplementary Material 

Supplementary figure S1 and table S1 are available at
Genome Biology and Evolution online
(http://www.gbe.oxfordjournals.org/).